Abstract

This article extends the concept of compressed sensing to signals that are not sparse 
in an orthonormal basis but rather in a redundant dictionary. It is shown that a 
matrix, which is a composition of a random matrix of certain type and a deterministic 
dictionary, has small restricted isometry constants. Thus, signals that are sparse with 
respect to the dictionary can be recovered via Basis Pursuit from a small number of 
random measurements. Further, thresholding is investigated as a recovery algorithm for
compressed sensing and conditions are provided that guarantee reconstruction with 
high probability. The different schemes are compared by numerical experiments. 

Key words: compressed sensing, redundant dictionary, sparse approximation, random 
matrix, restricted isometry constants, Basis Pursuit, thresholding, Orthogonal Matching 
Pursuit 

1 Introduction 

Recently there has been a growing interest in recovering sparse signals from their projection 
onto a small number of random vectors [U [5j [HJ [T3l [T9l [20] . The word most often used 
in this context is compressed sensing. It originates from the idea that it is not necessary 
to invest a lot of power into observing the entries of a sparse signal in all coordinates 
when most of them are zero anyway. Rather it should be possible to collect only a small 
number of measurements that still allow for reconstruction. This is potentially useful in 
applications where one cannot afford to collect or transmit a lot of measurements but has 
rich resources at the decoder. 

Until now the theory of compressed sensing has only been developed for classes of signals 
that have a very sparse representation in an orthonormal basis (ONB). This is a rather 
stringent restriction. Indeed, allowing the signal to be sparse with respect to a redundant
dictionary adds a lot of flexibility and significantly extends the range of applicability. 
Already the use of two ONBs instead of just one dramatically increases the class of signals 
that can be modelled in this way. A more practical example would be a dictionary made 
up of damped sinusoids which is used for NMR spectroscopy, see [12]. 

Before we can go into further explanations about the scope of this paper it is necessary
to provide some background information. The basic problem in compressed sensing
is to determine the minimal number $n$ of linear non-adaptive measurements that allows
for (stable) reconstruction of a signal $x \in \mathbb{R}^d$ that has at most $S$ non-zero components.
Additionally, one requires that this task can be performed reasonably fast. Each of the $n$
measurements can be written as an inner product of the sparse signal $x \in \mathbb{R}^d$ with a vector
in $\mathbb{R}^d$. To simplify the notation we store all the vectors as rows in a matrix $\Psi \in \mathbb{R}^{n \times d}$ and
all the measurements in the $n$-dimensional vector $s = \Psi x$.

A naive approach to the problem of recovering $x$ from $s$ consists in solving the $\ell_0$
minimization problem

\[ \min \|x\|_0 \quad \text{subject to} \quad \|s - \Psi x\|_2 \le \eta, \tag{$P_0$} \]

where $\eta$ is the expected noise on the measurements, $\|\cdot\|_0$ counts the number of non-zero
entries of $x$ and $\|\cdot\|_2$ denotes the standard Euclidean norm. Although there are simple
recovery conditions available, the above approach is not reasonable in practice because
solving it is NP-hard [7, 18].

In order to avoid this severe drawback, essentially two approaches have been proposed
in the signal recovery community. The first is using greedy algorithms like Thresholding
[H] or (Orthogonal) Matching Pursuit (OMP) [16, 21]. Thresholding simply calculates
the inner products of the signal with all atoms, finds the ones with largest absolute values
and then calculates the orthogonal projection onto the span of the corresponding atoms.
OMP works iteratively by picking the atoms in a greedy fashion. In each step it finds
the atom with highest absolute inner product with the residual and adds it to the already
found atoms. Then it calculates a new approximant by projecting the signal on the linear
span of the already found atoms and a new residual by subtracting the approximant from
the signal, cp. Table 1.

The second approach is the Basis Pursuit (BP) principle. Instead of considering ($P_0$)
one solves its convex relaxation

\[ \min \|x\|_1 \quad \text{subject to} \quad \|s - \Psi x\|_2 \le \eta, \tag{$P_1$} \]

where $\|x\|_1 = \sum_i |x_i|$ denotes the $\ell_1$-norm. This can be done via linear programming in
the real case and via cone programming in the complex case. Clearly, one hopes that the
solutions of ($P_0$) and ($P_1$) coincide, see [6, 9] for details.

Both approaches pose certain requirements on the matrix $\Psi$ in order to ensure recovery
success. Recently, Candes, Romberg and Tao [4, 5] observed that successful recovery by BP
is guaranteed whenever $\Psi$ obeys a uniform uncertainty principle. Essentially this means
that every submatrix of $\Psi$ of a certain size has to be well-conditioned. More precisely, let
$\Lambda \subset \{1, \dots, d\}$ and $\Psi_\Lambda$ be the submatrix of $\Psi$ consisting of the columns indexed by $\Lambda$.
The local isometry constant $\delta_\Lambda = \delta_\Lambda(\Psi)$ is the smallest number satisfying

\[ (1 - \delta_\Lambda)\|x\|_2^2 \le \|\Psi_\Lambda x\|_2^2 \le (1 + \delta_\Lambda)\|x\|_2^2, \tag{1.1} \]

for all coefficient vectors $x$ supported on $\Lambda$. The (global) restricted isometry constant is
then defined as

\[ \delta_S = \delta_S(\Psi) := \sup_{|\Lambda| = S} \delta_\Lambda(\Psi), \qquad S \in \mathbb{N}. \]

The matrix $\Psi$ is said to satisfy a uniform uncertainty principle if it has small restricted
isometry constants, say $\delta_S(\Psi) \le 1/2$. Based on this concept, Candes, Romberg and Tao
proved the following recovery theorem for BP in [4, Theorem 1].

Theorem 1.1. Assume that $\Psi$ satisfies

\[ \delta_{3S}(\Psi) + 3\delta_{4S}(\Psi) < 2 \]

for some $S \in \mathbb{N}$. Let $x$ be an $S$-sparse vector and assume we are given noisy data $y = \Psi x + \xi$
with $\|\xi\|_2 \le \eta$. Then the solution $x^*$ to the problem ($P_1$) satisfies

\[ \|x^* - x\|_2 \le C\eta. \tag{1.2} \]

The constant $C$ depends only on $\delta_{3S}$ and $\delta_{4S}$. If $\delta_{4S} \le 1/3$ then $C \le 15.41$.

In particular, if no noise is present, i.e., $\eta = 0$, then under the stated condition BP
recovers $x$ exactly. Note that a slight variation of the above theorem holds also in the case
that $x$ is not sparse in a strict sense, but can be well-approximated by an $S$-sparse vector
[4, Theorem 2].

Table 1: Greedy Algorithms
Goal: reconstruct $x$ from $s = \Psi x$
(columns of $\Psi$ denoted by $\psi_j$, $\Psi_\Lambda^\dagger$ pseudo-inverse of $\Psi_\Lambda$)

OMP
  initialise: $z = 0$, $r = s$, $\Lambda = \emptyset$
  find:       $i = \arg\max_j |\langle r, \psi_j \rangle|$
  update:     $\Lambda = \Lambda \cup \{i\}$, $r = s - \Psi_\Lambda \Psi_\Lambda^\dagger s$
  iterate until stopping criterion is attained
  output:     $x = \Psi_\Lambda^\dagger s$

Thresholding
  find:       $\Lambda$ that contains the indices corresponding to the $S$ largest
              values of $|\langle s, \psi_j \rangle|$
  output:     $x = \Psi_\Lambda^\dagger s$

Of course, the above theorem is only useful if there are matrices $\Psi$ satisfying the uniform
uncertainty principle. So far no deterministic construction is known (for a reasonably
small ratio $n/S$). However, an $n \times d$ random matrix with entries drawn from a standard
Gaussian distribution (or some other distribution showing certain concentration properties,
see below) will have small restricted isometry constants $\delta_S$ with 'overwhelming probability'
as long as

\[ n = O(S \log(d/S)), \tag{1.3} \]

see [3 H 13 123 for details. 
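The two greedy algorithms of Table 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch for the real-valued case: the function names `thresholding` and `omp`, the tolerance `tol`, and the choice of stopping OMP after $S$ atoms are assumptions made here, since Table 1 leaves the stopping criterion open. The pseudo-inverse $\Psi_\Lambda^\dagger s$ is applied through a least-squares solve.

```python
import numpy as np

def thresholding(Psi, s, S):
    """Select the S columns of Psi with largest |<s, psi_j>|, then project s onto them."""
    correlations = np.abs(Psi.T @ s)
    support = np.sort(np.argsort(correlations)[-S:])
    # least-squares solve = application of the pseudo-inverse of Psi restricted to the support
    coef, *_ = np.linalg.lstsq(Psi[:, support], s, rcond=None)
    x = np.zeros(Psi.shape[1])
    x[support] = coef
    return x, support

def omp(Psi, s, S, tol=1e-10):
    """Orthogonal Matching Pursuit; stops after S atoms or once the residual norm is below tol."""
    support, coef = [], np.zeros(0)
    residual = s.astype(float)
    while len(support) < S and np.linalg.norm(residual) > tol:
        i = int(np.argmax(np.abs(Psi.T @ residual)))
        if i in support:  # numerical stagnation: no new atom is selected
            break
        support.append(i)
        coef, *_ = np.linalg.lstsq(Psi[:, support], s, rcond=None)
        residual = s - Psi[:, support] @ coef
    idx = np.array(support, dtype=int)
    x = np.zeros(Psi.shape[1])
    x[idx] = coef
    return x, np.sort(idx)
```

Both functions return the coefficient vector together with the selected support, so that support recovery and coefficient recovery can be checked separately.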

The results for OMP in compressed sensing are weaker than for BP. While it can
again be shown that with high probability a signal can be reconstructed from the random
measurements $\Psi x$ if $n \ge C S \log d$, this result is no longer uniform in the sense that no
single measurement matrix $\Psi$ will simultaneously work for all possible sparse signals, see
[13].
As already announced we want to address the question whether the techniques described
above can be extended to signals $y$ that are not sparse in an ONB but rather in a redundant
dictionary $\Phi \in \mathbb{R}^{d \times K}$ with $K > d$. So now $y = \Phi x$, where $x$ has only few non-zero
components. Again the goal is to reconstruct $y$ from few measurements. More formally,
given a suitable measurement matrix $A \in \mathbb{R}^{n \times d}$ we want to recover $y$ from $s = Ay = A\Phi x$.
The key idea then is to use the sparse representation in $\Phi$ to drive the reconstruction
procedure, i.e., try to identify the sparse coefficient sequence $x$ and from that reconstruct
$y$. Clearly, we may represent $s = \Psi x$ with

\[ \Psi = A\Phi \in \mathbb{R}^{n \times K}. \]

In particular, we can apply all of the reconstruction methods described above by using this
particular matrix $\Psi$. Of course, the remaining question is whether for a fixed dictionary
$\Phi \in \mathbb{R}^{d \times K}$ one can find a suitable matrix $A \in \mathbb{R}^{n \times d}$ such that the composed matrix
$\Psi = A\Phi$ allows for reconstruction of vectors having only a small number of non-zero
entries. Again the strategy is to choose a random matrix $A$, for instance with independent
standard Gaussian entries, and investigate under which conditions on $n$ and $S$ recovery
is successful with high probability.

Note that already Donoho considered extensions from orthonormal bases to (redundant)
tight frames $\Phi$ in [8]. There it is assumed that the analysis coefficients $x' = \Phi^* y = \Phi^*\Phi x$
are sparse. For redundant frames, however, this assumption does not seem very realistic
as even for sparse vectors $x$ the coefficient vector $x' = \Phi^*\Phi x$ is usually fully populated.
In the following section we will investigate under which conditions on the deterministic
dictionary $\Phi$ its combination with a random measurement matrix will have small isometry
constants. By Theorem 1.1 this determines how many measurements $n$ will be typically
required for BP to succeed in reconstructing all signals of sparsity $S$ with respect to the
given dictionary. In Section 3 we will analyse the performance of thresholding, which
actually has not yet been considered as a reconstruction algorithm in compressed sensing
because of its simplicity and hence resulting limitations. The last section is dedicated to
numerical simulations showing the performance of compressed sensing for dictionaries in
practice and comparing it to the situation where sparsity is induced by an ONB. Even
though we have not yet been able to theoretically analyse OMP for compressed sensing we
will do simulations for all three approaches.

2 Isometry Constants for $A\Phi$

In order to determine the isometry constants for a matrix of the type $\Psi = A\Phi$, where $A$
is an $n \times d$ measurement matrix and $\Phi$ is a $d \times K$ dictionary, we will follow the approach
taken in [2], which was inspired by proofs for the Johnson-Lindenstrauss lemma [1]. We will
not discuss this connection further but use as starting point concentration of measure for
random variables. This describes the phenomenon that in high dimensions the probability
mass of certain random variables concentrates strongly around their expectation.
In the following we will assume that $A$ is an $n \times d$ random matrix that satisfies

\[ \mathbb{P}\big( \big| \|Av\|_2^2 - \|v\|_2^2 \big| \ge \varepsilon \|v\|_2^2 \big) \le 2 e^{-c \frac{n}{2} \varepsilon^2}, \qquad \varepsilon \in (0, 1/3), \tag{2.1} \]

for all $v \in \mathbb{R}^d$ and some constant $c > 0$. Let us list some examples of random matrices that
satisfy the above condition.

• Gaussian ensemble: If the entries of $A$ are independent normal variables with
mean zero and variance $n^{-1}$ then

\[ \mathbb{P}\big( \big| \|Av\|_2^2 - \|v\|_2^2 \big| \ge \varepsilon \|v\|_2^2 \big) \le 2 e^{-\frac{n}{2}(\varepsilon^2/2 - \varepsilon^3/3)}, \qquad \varepsilon \in (0, 1), \tag{2.2} \]

see e.g. [HE]. In particular, (2.1) holds with $c = 1/2 - 1/9 = 7/18$.

• Bernoulli ensemble: Choose the entries of $A$ as independent realizations of $\pm 1/\sqrt{n}$
random variables. Then again (2.2) is valid, see [1, 2]. In particular, (2.1) holds with
$c = 7/18$.

• Isotropic subgaussian ensembles: In generalization of the two examples above,
we can choose the rows of $A$ as $\frac{1}{\sqrt{n}}$-scaled independent copies of a random vector
$Y \in \mathbb{R}^d$ that satisfies $\mathbb{E}|\langle Y, v \rangle|^2 = \|v\|_2^2$ for all $v \in \mathbb{R}^d$ and has subgaussian tail
behaviour. See [17, eq. (3.2)] for details.

• Basis transformation: If we take any valid random matrix $A$ and a (deterministic)
orthogonal $d \times d$ matrix $U$ then it is easy to see that also $AU$ satisfies the concentration
inequality (2.1). In particular, this applies to the Bernoulli ensemble although in
general $AU$ and $A$ have different probability distributions.
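The concentration behaviour of the first two ensembles can be checked empirically. The following is a rough Monte Carlo sketch, not part of the analysis: the function names, the fixed unit test vector and the trial count are assumptions made for illustration. It compares the observed frequency of the event $|\|Av\|_2^2 - 1| \ge \varepsilon$ with the right-hand side of (2.2).

```python
import numpy as np

def deviation_frequency(ensemble, n, d, eps, trials, rng):
    """Fraction of random draws of A with | ||Av||^2 - 1 | >= eps for one fixed unit vector v."""
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    count = 0
    for _ in range(trials):
        if ensemble == "gaussian":
            A = rng.standard_normal((n, d)) / np.sqrt(n)       # N(0, 1/n) entries
        elif ensemble == "bernoulli":
            A = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(n)  # +-1/sqrt(n) entries
        else:
            raise ValueError("unknown ensemble")
        if abs(np.linalg.norm(A @ v) ** 2 - 1.0) >= eps:
            count += 1
    return count / trials

def tail_bound(n, eps):
    """Right-hand side of (2.2)."""
    return 2.0 * np.exp(-0.5 * n * (eps ** 2 / 2 - eps ** 3 / 3))
```

For instance, with $n = 200$ and $\varepsilon = 0.3$ the bound evaluates to $2e^{-3.6}$, and the observed frequencies for both ensembles should stay below it.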

Using the concentration inequality (2.1) we can now investigate the local and subsequently
the global restricted isometry constants of the $n \times K$ matrix $A\Phi$.

Lemma 2.1. Let $A$ be a random matrix of size $n \times d$ drawn from a distribution that satisfies
the concentration inequality (2.1). Extract from the $d \times K$ dictionary $\Phi$ any sub-dictionary
$\Phi_\Lambda$ of size $S$, i.e., $|\Lambda| = S$, with (local) isometry constant $\delta_\Lambda = \delta_\Lambda(\Phi)$. For $0 < \delta < 1$ we
set

\[ \nu := \delta_\Lambda + \delta + \delta_\Lambda \delta. \tag{2.3} \]

Then

\[ (1 - \nu)\|x\|_2^2 \le \|A\Phi_\Lambda x\|_2^2 \le (1 + \nu)\|x\|_2^2 \tag{2.4} \]

with probability exceeding

\[ 1 - 2\left(1 + \frac{12}{\delta}\right)^S e^{-\frac{c}{9}\delta^2 n}. \tag{2.5} \]

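Proof: First we choose a finite $\varepsilon_1$-covering of the unit sphere in $\mathbb{R}^S$, i.e., a set of points
$Q$, with $\|q\|_2 = 1$ for all $q \in Q$, such that for all $\|x\|_2 = 1$

\[ \min_{q \in Q} \|x - q\|_2 \le \varepsilon_1 \]

for some $\varepsilon_1 \in (0, 1)$. According to Lemma 2.2 in [17] there exists such a $Q$ with $|Q| \le
(1 + 2/\varepsilon_1)^S$. Applying the measure concentration in (2.1) with $\varepsilon_2 < 1/3$ to all the points
$\Phi_\Lambda q$ and taking the union bound we get

\[ (1 - \varepsilon_2)\|\Phi_\Lambda q\|_2^2 \le \|A\Phi_\Lambda q\|_2^2 \le (1 + \varepsilon_2)\|\Phi_\Lambda q\|_2^2 \quad \text{for all } q \in Q, \tag{2.6} \]

with probability larger than

\[ 1 - 2\left(1 + \frac{2}{\varepsilon_1}\right)^S e^{-c n \varepsilon_2^2}. \]

Define $\nu$ as the smallest number such that

\[ \|A\Phi_\Lambda x\|_2^2 \le (1 + \nu)\|x\|_2^2 \quad \text{for all } x \text{ supported on } \Lambda. \tag{2.7} \]

Now we estimate $\nu$ in terms of $\varepsilon_1, \varepsilon_2$. We know that for all $x$ with $\|x\|_2 = 1$ we can choose
a $q$ such that $\|x - q\|_2 \le \varepsilon_1$ and get

\[ \begin{aligned}
\|A\Phi_\Lambda x\|_2 &\le \|A\Phi_\Lambda q\|_2 + \|A\Phi_\Lambda (x - q)\|_2 \\
&\le (1 + \varepsilon_2)^{1/2}\|\Phi_\Lambda q\|_2 + \|A\Phi_\Lambda (x - q)\|_2 \\
&\le (1 + \varepsilon_2)^{1/2}(1 + \delta_\Lambda)^{1/2} + (1 + \nu)^{1/2}\varepsilon_1.
\end{aligned} \]

Since $\nu$ is the smallest possible constant for which (2.7) holds it also has to satisfy

\[ \sqrt{1 + \nu} \le \sqrt{1 + \varepsilon_2}\,\sqrt{1 + \delta_\Lambda} + \varepsilon_1\sqrt{1 + \nu}. \]

Simplifying the above equation yields

\[ \nu \le \frac{1 + \varepsilon_2}{(1 - \varepsilon_1)^2}(1 + \delta_\Lambda) - 1. \]

Now we choose $\varepsilon_1 = \delta/6$ and $\varepsilon_2 = \delta/3 < 1/3$. Then

\[ \frac{1 + \varepsilon_2}{(1 - \varepsilon_1)^2} = \frac{1 + \delta/3}{(1 - \delta/6)^2} = \frac{1 + \delta/3}{1 - \delta/3 + \delta^2/36} < \frac{1 + \delta/3}{1 - \delta/3} = 1 + \frac{2\delta/3}{1 - \delta/3} < 1 + \delta. \]

Thus,

\[ \nu < \delta + \delta_\Lambda(1 + \delta). \]

To get the lower bound we operate in a similar fashion.

\[ \begin{aligned}
\|A\Phi_\Lambda x\|_2 &\ge \|A\Phi_\Lambda q\|_2 - \|A\Phi_\Lambda (x - q)\|_2 \\
&\ge (1 - \varepsilon_2)^{1/2}(1 - \delta_\Lambda)^{1/2} - (1 + \nu)^{1/2}\varepsilon_1.
\end{aligned} \]

Now square both sides and observe that $\nu < 1$ (otherwise we have nothing to show). Then
we finally arrive at

\[ \begin{aligned}
\|A\Phi_\Lambda x\|_2^2 &\ge \big((1 - \varepsilon_2)^{1/2}(1 - \delta_\Lambda)^{1/2} - \varepsilon_1\sqrt{2}\big)^2 \\
&\ge (1 - \varepsilon_2)(1 - \delta_\Lambda) - 2\varepsilon_1\sqrt{2}\sqrt{1 - \varepsilon_2}\sqrt{1 - \delta_\Lambda} + 2\varepsilon_1^2 \\
&\ge 1 - \delta_\Lambda - \varepsilon_2 - 2\varepsilon_1\sqrt{2} \ge 1 - \delta_\Lambda - \delta \ge 1 - \nu.
\end{aligned} \]

This completes the proof. □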

Note that the choice of $\varepsilon_1$ and $\varepsilon_2$ in the previous proof is not the only one possible.
While our choice has the advantage of resulting in an appealing form of $\nu$ in (2.3), others
might actually yield better constants.

Based on the previous lemma it is easy to derive an estimate of the global restricted
isometry constants of the composed matrix $\Psi = A\Phi$.

Theorem 2.2. Let $\Phi \in \mathbb{R}^{d \times K}$ be a dictionary of size $K$ in $\mathbb{R}^d$ with restricted isometry
constant $\delta_S(\Phi)$, $S \in \mathbb{N}$. Let $A \in \mathbb{R}^{n \times d}$ be a random matrix satisfying (2.1) and assume

\[ n \ge C\delta^{-2}\big(S \log(K/S) + \log(2e(1 + 12/\delta)) + t\big) \tag{2.8} \]

for some $\delta \in (0, 1)$ and $t > 0$. Then with probability at least $1 - e^{-t}$ the composed matrix
$\Psi = A\Phi$ has restricted isometry constant

\[ \delta_S(A\Phi) \le \delta_S(\Phi) + \delta(1 + \delta_S(\Phi)). \tag{2.9} \]

The constant satisfies $C \le 9/c$.
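Proof: By Lemma 2.1 we can estimate the probability that a sub-dictionary $\Psi_\Lambda =
(A\Phi)_\Lambda = A\Phi_\Lambda$, $\Lambda \subset \{1, \dots, K\}$, fails to have (local) isometry constant $\delta_\Lambda(\Psi) \le \delta_\Lambda(\Phi) +
\delta + \delta_\Lambda(\Phi)\delta$ by

\[ \mathbb{P}\big(\delta_\Lambda(\Psi) > \delta_\Lambda(\Phi) + \delta + \delta_\Lambda(\Phi)\delta\big) \le 2\left(1 + \frac{12}{\delta}\right)^S e^{-\frac{c}{9}\delta^2 n}. \]

By taking the union bound over all $\binom{K}{S}$ possible sub-dictionaries of size $S$ we can estimate
the probability of $\delta_S(\Psi) = \sup_{\Lambda \subset \{1, \dots, K\},\, |\Lambda| = S} \delta_\Lambda(\Psi)$ not satisfying (2.9) by

\[ \mathbb{P}\big(\delta_S(\Psi) > \delta_S(\Phi) + \delta(1 + \delta_S(\Phi))\big) \le 2\binom{K}{S}\left(1 + \frac{12}{\delta}\right)^S e^{-\frac{c}{9}\delta^2 n}. \]

Using $\binom{K}{S} \le (eK/S)^S$ (Stirling's formula) and requiring that the above term is less than
$e^{-t}$ shows the claim. □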



Note that for fixed $\delta$ and $t$ condition (2.8) can be expressed in the more compact form

\[ n \ge C S \log(K/S). \]

Moreover, if the dictionary $\Phi$ is an orthonormal basis then $\delta_S(\Phi) = 0$ and we recover
essentially the previously known estimates of the isometry constants for a random matrix
$A$, see e.g. [2, Theorem 5.2].

Now that we have established how the isometry constants of a deterministic dictionary
$\Phi$ are affected by multiplication with a random measurement matrix, we only need some
more initial information about $\Phi$ before we can finally apply the result to compressed
sensing of signals that are sparse in $\Phi$. The following little lemma gives a very crude
estimate of the isometry constants of $\Phi$ in terms of its coherence $\mu$ or Babel function
$\mu_1(k)$, which are defined as

\[ \mu := \max_{i \ne j} |\langle \varphi_i, \varphi_j \rangle|, \qquad \mu_1(k) := \max_{|\Lambda| = k,\; j \notin \Lambda} \sum_{i \in \Lambda} |\langle \varphi_i, \varphi_j \rangle|. \tag{2.10} \]

Lemma 2.3. For a dictionary with coherence $\mu$ and Babel function $\mu_1(k)$ we can bound
the restricted isometry constants by

\[ \delta_S \le \mu_1(S - 1) \le (S - 1)\mu. \tag{2.11} \]

Proof: Essentially this can be derived from the proof of Lemma 2.3 in [21]. □
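To make the quantities in (2.10) and (2.11) concrete, the following NumPy sketch computes the coherence, the Babel function and a local isometry constant for a small Dirac-DCT dictionary. The helper names and the orthonormal DCT-II construction are assumptions chosen for illustration; the Babel function is evaluated as the largest sum of the $k$ biggest off-diagonal Gram entries in any column, which coincides with the definition in (2.10).

```python
import numpy as np

def dct_basis(d):
    """Orthonormal DCT-II basis vectors as columns (rows: samples, columns: frequencies)."""
    k = np.arange(d)[:, None]
    j = np.arange(d)[None, :]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * (2 * k + 1) * j / (2 * d))
    C[:, 0] /= np.sqrt(2.0)  # constant vector needs the extra 1/sqrt(2) normalisation
    return C

def off_diagonal_gram(Phi):
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G

def coherence(Phi):
    """mu = max_{i != j} |<phi_i, phi_j>|."""
    return off_diagonal_gram(Phi).max()

def babel(Phi, k):
    """mu_1(k): for each atom, sum the k largest correlations with other atoms; take the max."""
    G_desc = np.sort(off_diagonal_gram(Phi), axis=0)[::-1]
    return G_desc[:k].sum(axis=0).max()

def local_isometry_constant(Phi, support):
    """delta_Lambda from the extreme eigenvalues of the Gram matrix of the sub-dictionary."""
    eig = np.linalg.eigvalsh(Phi[:, support].T @ Phi[:, support])
    return max(1.0 - eig[0], eig[-1] - 1.0)
```

For any support of size $S$ the value returned by `local_isometry_constant` should not exceed `babel(Phi, S - 1)`, which in turn is at most `(S - 1) * coherence(Phi)`, in line with (2.11).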

Combining this Lemma with Theorem 2.2 provides the following estimate of the isometry
constants of the composed matrix $\Psi = A\Phi$.

Corollary 2.4. Let $\Phi \in \mathbb{R}^{d \times K}$ be a dictionary with coherence $\mu$. Assume that

\[ S - 1 \le \frac{1}{16}\mu^{-1}. \tag{2.12} \]

Let $A \in \mathbb{R}^{n \times d}$ be a random matrix satisfying (2.1). Assume that

\[ n \ge C_1\big(S \log(K/S) + C_2 + t\big). \]

Then with probability at least $1 - e^{-t}$ the composed matrix $A\Phi$ has restricted isometry
constant

\[ \delta_S(A\Phi) \le 1/3. \tag{2.13} \]

The constants satisfy $C_1 \le 138.51\, c^{-1}$ and $C_2 \le \log(1250/13) + 1 \approx 5.57$. In particular,
for the Gaussian and Bernoulli ensemble $C_1 \le 356.18$.

Proof: By Lemma 2.3 the restricted isometry constant of $\Phi$ satisfies

\[ \delta_S(\Phi) \le (S - 1)\mu \le 1/16. \]

Hence, choosing $\delta = 13/(3 \cdot 17)$ yields

\[ \delta_S(A\Phi) \le \delta_S(\Phi) + \delta(1 + \delta_S(\Phi)) \le \frac{1}{16} + \frac{13}{51}\left(1 + \frac{1}{16}\right) = 1/3. \]

Plugging this particular choice of $\delta$ into Theorem 2.2 yields the assertion. □

Of course, the numbers 1/16 and 1/3 in (2.12) and (2.13) were just arbitrarily chosen.
Other choices will only result in different constants $C_1, C_2$. Combining the previous result
with Theorem 1.1 yields a result on stable recovery by Basis Pursuit of sparse signals
in a redundant dictionary. We leave the straightforward task of formulating the precise
statement to the interested reader. We just want to point out that this recovery result is
uniform in the sense that a single matrix $A$ can ensure recovery of all sparse signals.

The constants $C_1$ and $C_2$ of Corollary 2.4 are probably not optimal. In the case of a
Gaussian ensemble $A$ and an orthonormal basis $\Phi$ recovery conditions for BP with quite
small constants were obtained in [20] and precise asymptotic results can be found in [10].
One might raise the objection that the condition $S - 1 \le \frac{1}{16\mu}$ in Corollary 2.4 is too weak
for practical applications. A lower bound on the coherence in terms of the dictionary size is

\[ \mu \ge \sqrt{\frac{K - d}{d(K - 1)}}, \]

and for reasonable dictionaries we can usually expect the coherence to be of the order
$\mu \sim C/\sqrt{d}$. The restriction on the sparsity thus is $S < \sqrt{d}/C$. However, compressed
sensing is only useful if indeed the sparsity is rather small compared to the dimension $d$, so
this restriction is actually not severe. Moreover, if it is already impossible to recover the
support from complete information on the original signal we cannot expect to do this
with even less information.

To illustrate the theorem let us have a look at an example where the dictionary is the 
union of two ONBs. 

Example 2.5 (Dirac-DCT). Assume that our dictionary is the union of the Dirac and the
Discrete Cosine Transform bases in $\mathbb{R}^d$ for $d = 2^{2p+1}$. The coherence in this case is $\mu =
\sqrt{2/d} = 2^{-p}$ and the number of atoms $K = 2^{2p+2}$. If we assume the sparsity of the signal
to be smaller than $2^{p-6}$ we get the following crude estimate for the number of necessary
samples to have $\delta_{4S}(A\Phi) \le 1/3$ as recommended for recovery by BP in Theorem 1.1,

\[ n \ge C_1\big(4S(2p \log 2 - \log S) + C_2 + t\big) \]

with the constants $C_1 \approx 138.51\, c^{-1}$ and $C_2 \approx 5.57$ from Corollary 2.4.

In comparison, if the signal is sparse in just the Dirac basis we can estimate the necessary
number of samples to have $\delta_{4S}(A) \le 1/3$ with Theorem 2.2 as

\[ n \ge C_1'\big(4S(2p \log 2 - \log 2S) + C_2' + t\big) \]

with $C_1' = (13/17)^2 C_1$ and $C_2' \approx 5.3$, thus implying an improvement of roughly the factor
$(17/13)^2 \approx 1.71$.
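The numerical constants quoted in Corollary 2.4 and Example 2.5 follow from Theorem 2.2 by elementary arithmetic, which the short sketch below reproduces. The function name `theorem_2_2_constants` and its argument names are hypothetical; the sketch solves $\delta_S(\Phi) + \delta(1 + \delta_S(\Phi)) = \text{target}$ for $\delta$ and inserts the result into $C_1 = 9/(c\,\delta^2)$ and $C_2 = \log(2e(1 + 12/\delta))$.

```python
import math

def theorem_2_2_constants(delta_phi, target, c):
    """Return (delta, C1, C2) such that delta_phi + delta * (1 + delta_phi) equals target."""
    delta = (target - delta_phi) / (1.0 + delta_phi)
    C1 = 9.0 / (c * delta ** 2)                       # uses C <= 9/c from Theorem 2.2
    C2 = math.log(2.0 * math.e * (1.0 + 12.0 / delta))
    return delta, C1, C2

# Redundant dictionary with delta_S(Phi) <= 1/16, constants stated in units of 1/c (c = 1)
delta_dict, C1_dict, C2_dict = theorem_2_2_constants(1 / 16, 1 / 3, 1.0)
# Orthonormal basis, delta_S(Phi) = 0
delta_onb, C1_onb, C2_onb = theorem_2_2_constants(0.0, 1 / 3, 1.0)

print(delta_dict, C1_dict, C2_dict)   # delta = 13/51, C1 close to 138.51, C2 close to 5.57
print(C1_dict * 18 / 7)               # Gaussian/Bernoulli case c = 7/18, close to 356.18
print(C1_dict / C1_onb)               # improvement factor (17/13)^2, close to 1.71
```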

3 Recovery by Thresholding 

In this section we investigate recovery from random measurements by thresholding. Since 
thresholding works by comparing inner products of the signal with the atoms an essential 
ingredient will be stability of inner products under multiplication with a random matrix 
A, i.e., 

\[ \langle Ax, Ay \rangle \approx \langle x, y \rangle. \]

The exact result that we will use is summarised in the following lemma. 

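Lemma 3.1. Let $x, y \in \mathbb{R}^d$ with $\|x\|_2, \|y\|_2 \le 1$. Assume that $A$ is an $n \times d$ random matrix
with independent $\mathcal{N}(0, n^{-1})$ entries (independent of $x, y$). Then for all $t > 0$

\[ \mathbb{P}\big(|\langle Ax, Ay \rangle - \langle x, y \rangle| \ge t\big) \le 2\exp\left(-n \frac{t^2}{C_1 + C_2 t}\right) \tag{3.1} \]

with $C_1 = \frac{4e}{\sqrt{6\pi}} \approx 2.5044$ and $C_2 = \sqrt{8}\,e \approx 7.6885$.

The analogous statement holds for a random matrix $A$ with independent $\pm 1/\sqrt{n}$ Bernoulli
entries. In this case the constants are $C_1 = \frac{2e}{\sqrt{6\pi}} \approx 1.2522$ and $C_2 = 2e \approx 5.4366$.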
Note that taking $x = y$ in the lemma provides the concentration inequality (2.1) for
Gaussian and Bernoulli matrices (with non-optimal constants however).

The proof of the lemma is rather technical and therefore safely locked away in Appendix A,
awaiting inspection by the genuinely interested reader. However, armed
with it, we can now investigate the stability of recovery via thresholding.

Theorem 3.2. Let $\Phi$ be a $d \times K$ dictionary. Assume that the support $\Lambda$ of a signal $y = \Phi x$,
normalised to have $\|y\|_2 = 1$, could be recovered by thresholding with a margin $\varepsilon$, i.e.,

\[ \min_{i \in \Lambda} |\langle y, \varphi_i \rangle| > \max_{k \notin \Lambda} |\langle y, \varphi_k \rangle| + \varepsilon. \]

Let $A$ be an $n \times d$ random matrix satisfying one of the two probability models of the previous
lemma. Then with probability exceeding $1 - e^{-t}$ the support and thus the signal can be
reconstructed via thresholding from the $n$-dimensional measurement vector $s = Ay = A\Phi x$
as long as

\[ n \ge C(\varepsilon)\,(\log(2K) + t), \]

where $C(\varepsilon) = 4C_1\varepsilon^{-2} + 2C_2\varepsilon^{-1}$ and $C_1, C_2$ are the constants from Lemma 3.1. In particular,

\[ C(\varepsilon) \le C_3\,\varepsilon^{-2} \]

with $C_3 \le 4C_1 + 2C_2 \le 25.40$ for the Gaussian case and $C_3 \le 15.89$ in the Bernoulli case.
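Proof: Thresholding will succeed if we have

\[ \min_{i \in \Lambda} |\langle Ay, A\varphi_i \rangle| > \max_{k \notin \Lambda} |\langle Ay, A\varphi_k \rangle|. \]

So let us estimate the probability that the above inequality does not hold,

\[ \begin{aligned}
&\mathbb{P}\Big(\min_{i \in \Lambda} |\langle Ay, A\varphi_i \rangle| \le \max_{k \notin \Lambda} |\langle Ay, A\varphi_k \rangle|\Big) \\
&\quad \le \mathbb{P}\Big(\min_{i \in \Lambda} |\langle Ay, A\varphi_i \rangle| \le \min_{i \in \Lambda} |\langle y, \varphi_i \rangle| - \frac{\varepsilon}{2}\Big)
 + \mathbb{P}\Big(\max_{k \notin \Lambda} |\langle Ay, A\varphi_k \rangle| \ge \max_{k \notin \Lambda} |\langle y, \varphi_k \rangle| + \frac{\varepsilon}{2}\Big).
\end{aligned} \]

The probability of the good components having responses lower than the threshold can be
further estimated as

\[ \begin{aligned}
\mathbb{P}\Big(\min_{i \in \Lambda} |\langle Ay, A\varphi_i \rangle| \le \min_{i \in \Lambda} |\langle y, \varphi_i \rangle| - \frac{\varepsilon}{2}\Big)
&\le \mathbb{P}\Big(\bigcup_{i \in \Lambda} \Big\{ |\langle Ay, A\varphi_i \rangle| \le |\langle y, \varphi_i \rangle| - \frac{\varepsilon}{2} \Big\}\Big) \\
&\le \sum_{i \in \Lambda} \mathbb{P}\Big(|\langle y, \varphi_i \rangle - \langle Ay, A\varphi_i \rangle| \ge \frac{\varepsilon}{2}\Big)
 \le 2|\Lambda| \exp\left(-n \frac{\varepsilon^2/4}{C_1 + C_2\varepsilon/2}\right).
\end{aligned} \]

Similarly we can bound the probability of the bad components being higher than the
threshold,

\[ \begin{aligned}
\mathbb{P}\Big(\max_{k \notin \Lambda} |\langle Ay, A\varphi_k \rangle| \ge \max_{k \notin \Lambda} |\langle y, \varphi_k \rangle| + \frac{\varepsilon}{2}\Big)
&\le \mathbb{P}\Big(\bigcup_{k \notin \Lambda} \Big\{ |\langle Ay, A\varphi_k \rangle| \ge |\langle y, \varphi_k \rangle| + \frac{\varepsilon}{2} \Big\}\Big) \\
&\le \sum_{k \notin \Lambda} \mathbb{P}\Big(|\langle Ay, A\varphi_k \rangle - \langle y, \varphi_k \rangle| \ge \frac{\varepsilon}{2}\Big)
 \le 2|\overline{\Lambda}| \exp\left(-n \frac{\varepsilon^2/4}{C_1 + C_2\varepsilon/2}\right),
\end{aligned} \]

where $\overline{\Lambda}$ denotes the complement of $\Lambda$ in $\{1, \dots, K\}$. Combining these two estimates
we see that the probability of success for thresholding is exceeding

\[ 1 - 2K \exp\left(-n \frac{\varepsilon^2/4}{C_1 + C_2\varepsilon/2}\right). \]

The theorem finally follows from requiring this probability to be higher than $1 - e^{-t}$ and
solving for $n$. □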
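As a quick numerical illustration of Theorem 3.2, the sketch below evaluates $C(\varepsilon) = 4C_1\varepsilon^{-2} + 2C_2\varepsilon^{-1}$ with the constants of Lemma 3.1 and turns the condition $n \ge C(\varepsilon)(\log(2K) + t)$ into a number of measurements. The function and variable names are hypothetical, and rounding up to an integer is a choice made here.

```python
import math

# (C1, C2) from Lemma 3.1 for the two probability models
GAUSSIAN = (4 * math.e / math.sqrt(6 * math.pi), math.sqrt(8) * math.e)
BERNOULLI = (2 * math.e / math.sqrt(6 * math.pi), 2 * math.e)

def c_eps(eps, constants):
    """C(eps) = 4*C1/eps^2 + 2*C2/eps from Theorem 3.2."""
    C1, C2 = constants
    return 4 * C1 / eps ** 2 + 2 * C2 / eps

def sufficient_measurements(eps, K, t, constants):
    """Smallest integer n with n >= C(eps) * (log(2K) + t)."""
    return math.ceil(c_eps(eps, constants) * (math.log(2 * K) + t))

# C3 = 4*C1 + 2*C2 bounds C(eps) * eps^2 for every margin eps <= 1
C3_gauss = 4 * GAUSSIAN[0] + 2 * GAUSSIAN[1]
C3_bern = 4 * BERNOULLI[0] + 2 * BERNOULLI[1]
```

Since the bound grows like $\varepsilon^{-2}$, halving the margin roughly quadruples the sufficient number of measurements.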

The result above may appear surprising because the number of measurements seems
to be independent of the sparsity. The dependence, however, is quite well hidden in the
margin $\varepsilon$ and the normalization $\|y\|_2 = 1$. For clarification we will estimate $\varepsilon$ given the
coefficients and the coherence of the dictionary.

Corollary 3.3. Let $\Phi$ be a $d \times K$ dictionary with Babel function $\mu_1$ defined in (2.10).
Assume a signal $y = \Phi_\Lambda x$ with $|\Lambda| = S$ satisfies the sufficient recovery condition for
thresholding,

\[ \frac{|x_{\min}|}{\|x\|_\infty} > \mu_1(S) + \mu_1(S - 1), \tag{3.2} \]

where $|x_{\min}| = \min_{i \in \Lambda} |x_i|$. If $A$ is an $n \times d$ random matrix according to one of the probability
models in Lemma 3.1 then with probability at least $1 - e^{-t}$ thresholding can recover $x$ (and
hence $y$) from $s = Ay = A\Phi x$ as long as

\[ n \ge C_3\, S\,(1 + \mu_1(S - 1)) \left( \frac{|x_{\min}|}{\|x\|_\infty} - \mu_1(S) - \mu_1(S - 1) \right)^{-2} (\log(2K) + t). \tag{3.3} \]

Here, $C_3$ is the constant from Theorem 3.2.

In the special case that the dictionary is an ONB the signal always satisfies the recovery
condition and the bound for the necessary number of samples reduces to

\[ n \ge C_3\, S \left( \frac{\|x\|_\infty}{|x_{\min}|} \right)^2 (\log(2K) + t). \tag{3.4} \]
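Proof: The best possible value for $\varepsilon$ in Theorem 3.2 is quite obviously

\[ \begin{aligned}
\varepsilon &= \min_{i \in \Lambda} |\langle y/\|y\|_2, \varphi_i \rangle| - \max_{k \notin \Lambda} |\langle y/\|y\|_2, \varphi_k \rangle| \\
&= \frac{1}{\|y\|_2} \Big( \min_{i \in \Lambda} \Big| \sum_{j \in \Lambda} x_j \langle \varphi_j, \varphi_i \rangle \Big| - \max_{k \notin \Lambda} \Big| \sum_{j \in \Lambda} x_j \langle \varphi_j, \varphi_k \rangle \Big| \Big) \\
&\ge \frac{1}{\|y\|_2} \big( |x_{\min}| - \|x\|_\infty \mu_1(S - 1) - \|x\|_\infty \mu_1(S) \big).
\end{aligned} \]

Therefore, we can bound the factor $C(\varepsilon)$ in Theorem 3.2 as

\[ C(\varepsilon) \le C_3\,\varepsilon^{-2} \le C_3 \frac{\|y\|_2^2}{\|x\|_\infty^2} \left( \frac{|x_{\min}|}{\|x\|_\infty} - \mu_1(S) - \mu_1(S - 1) \right)^{-2}. \]

To get to the final estimate observe that by Lemma 2.3

\[ \frac{\|y\|_2^2}{\|x\|_\infty^2} = \frac{\|\Phi_\Lambda x\|_2^2}{\|x\|_\infty^2} \le (1 + \mu_1(S - 1)) \frac{\|x\|_2^2}{\|x\|_\infty^2} \le (1 + \mu_1(S - 1))\, S. \]

The case of an ONB simply follows from $\mu_1(S) = 0$. □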

The previous results tell us that as for BP we can choose the number $n$ of samples linear
in the sparsity $S$. However, for thresholding successful recovery additionally depends on the
ratio of the largest to the smallest coefficient. Also, in contrast to BP the result is no longer
uniform, meaning that the stated success probability is only valid for the given signal $x$.
It does not imply that a single matrix $A$ can ensure recovery for all sparse signals. Indeed,
in the case of a Gaussian matrix $A$ and an orthonormal basis $\Phi$ it is known that once $A$
is randomly chosen then with high probability there exists a sparse signal $x$ (depending
on $A$) such that thresholding fails on $x$ unless the number of samples $n$ is quadratic in the
sparsity $S$, see e.g. [HI Section 7]. This fact seems to generalise to redundant
dictionaries.

Example 3.4 (Dirac-DCT). Assume again that our dictionary is the union of the Dirac
and the Discrete Cosine Transform bases in $\mathbb{R}^d$ for $d = 2^{2p+1}$. The coherence is again
$\mu = 2^{-p}$ and the number of atoms $K = 2^{2p+2}$. If we assume the sparsity $S < 2^{p-2}$ and
balanced coefficients, i.e., $|x_i| = 1$, we get the following crude estimate for the number of
necessary samples

\[ n \ge 6 C_3 S\big(\log(2)(2p + 2) + t\big). \]

If we just allow the use of one of the two ONBs to build the signal, the number of necessary
samples reduces to

\[ n \ge C_3 S\big(\log(2)(2p + 1) + t\big). \]

Again we see that whenever the sparsity $S < \sqrt{d}$ the results for ONBs and general
dictionaries are comparable. At this point it would be nice to have a similar result for OMP.
This task seems rather difficult due to stochastic dependency issues and so, unfortunately,
we have not been able to do this analysis yet.
4 Numerical Simulations



For our numerical simulations we used the same dictionary as for the examples, i.e., the
combination of the Dirac and the Discrete Cosine Transform bases in $\mathbb{R}^d$, $d = 256$, with
coherence $\mu = \sqrt{1/128} \approx 0.0884$.

We drew six measurement matrices of size $n \times d$, with $n$ varying between 64 and 224
in steps of 32, by choosing each entry as independent realisation of a centered Gaussian
random variable with variance $\sigma^2 = n^{-1}$. Then for every sparsity level $S$, varying between
4 and 64 in steps of 4, respectively between 2 and 32 in steps of 2 for thresholding, we
constructed 100 signals. The support $\Lambda$ was chosen uniformly at random among all $\binom{K}{S}$
possible supports of the given sparsity $S$. For BP and OMP the coefficients $(x_i)_{i \in \Lambda}$ of
the corresponding entries were drawn from a normalised standard Gaussian distribution
while for thresholding we chose them of absolute value one with random signs. Then
for each of the algorithms we counted how often the correct support could be recovered.
For comparison the same setup was repeated replacing the dictionary with the canonical
(Dirac) basis. The results are displayed in Figures 1, 2 and 3.
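A scaled-down version of the thresholding experiment can be sketched as follows. This is not the code used for the figures: the function names, the smaller dimensions and the random seeds are assumptions chosen so that the example runs quickly. As in the setup above, one Gaussian measurement matrix is drawn per sample size $n$ and the coefficients have absolute value one with random signs.

```python
import numpy as np

def dirac_dct(d):
    """Union of the Dirac basis and an orthonormal DCT-II basis in R^d (a d x 2d matrix)."""
    k = np.arange(d)[:, None]
    j = np.arange(d)[None, :]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * (2 * k + 1) * j / (2 * d))
    C[:, 0] /= np.sqrt(2.0)
    return np.hstack([np.eye(d), C])

def thresholding_support(Psi, s, S):
    """Indices of the S largest values of |<s, psi_j>|."""
    return set(np.argsort(np.abs(Psi.T @ s))[-S:].tolist())

def recovery_rate(Phi, n, S, trials, rng):
    """Fraction of random S-sparse signals whose support thresholding recovers from s = A Phi x."""
    d, K = Phi.shape
    A = rng.standard_normal((n, d)) / np.sqrt(n)   # one matrix per sample size n
    Psi = A @ Phi
    hits = 0
    for _ in range(trials):
        support = rng.choice(K, size=S, replace=False)
        x = np.zeros(K)
        x[support] = rng.choice([-1.0, 1.0], size=S)  # absolute value one, random signs
        hits += thresholding_support(Psi, Psi @ x, S) == set(support.tolist())
    return hits / trials
```

Calling `recovery_rate(dirac_dct(256), n, S, 100, np.random.default_rng())` over a grid of $n$ and $S$, and once more with `np.eye(256)` in place of the dictionary, yields the kind of recovery-rate tables visualised in Figure 2.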




Figure 1: Recovery Rates for BP as a Function of the Support and Sample Sizes
[two panels; horizontal axis: support size]

Figure 2: Recovery Rates for Thresholding as a Function of the Support and Sample Sizes
[two panels; horizontal axis: support size]

As predicted by the theorems the necessary number of measurements is higher if the 
sparsity inducing dictionary is not an ONB. If we compare the three recovery schemes we 
see that thresholding gives the weakest results as expected. However, the improvement 
in performance of BP over OMP is not that significant. This is especially interesting 
considering that in practice BP is a lot more computationally intensive than OMP. 
Figure 3: Recovery Rates for OMP as a Function of the Support and Sample Sizes



5 Conclusions & Future Work

We have shown that compressed sensing can also be applied to signals that are sparse 
in a redundant dictionary. The spirit is that whenever the support can be reconstructed 
from the signal itself it can also be reconstructed from a small number of random samples 
with high probability. We have shown that this kind of stability is valid for reconstruction 
by Basis Pursuit as well as for the simple thresholding algorithm. Thresholding has the 
advantage of being much faster and easier to implement than BP. However, it has the 
slight drawback that the number of required samples depends on the ratio of the largest to 
the smallest coefficient, and recovery is only guaranteed with high probability for a given 
signal and not uniformly for all signals in contrast to BP. Furthermore, there is numerical 
evidence that Orthogonal Matching Pursuit also works well. In particular, it is still faster 
than BP and the required number of samples does not seem to depend on the ratio of the 
largest to the smallest coefficient. 

For the future there remains plenty of work to do. First of all we would like to have 
a recovery theorem for OMP comparable to Theorem 3.2. However, since in the course of
iterating the updated residuals become stochastically dependent on the random matrix A
this task does not seem to be straightforward. In particular, the technique developed in
[13] cannot be applied directly. Then we would like to investigate for which dictionaries it
is possible to replace the random Gaussian/Bernoulli matrix by a random Fourier matrix,
see also [19]. This would have the advantage that the Fast Fourier Transform can be used
in the algorithms in order to speed up the reconstruction. Finally, it would be interesting 
to relax the incoherence assumption on the dictionary. 

A Proof of Lemma 3.1

Our proof uses the following inequality due to Bennett (also referred to as Bernstein's
inequality) [3, eq. (7)], see also [22, Lemma 2.2.11].

Theorem A.1. Let $X_1, \dots, X_n$ be independent random variables with zero mean such that

\[ \mathbb{E}|X_i|^q \le q!\, M^{q-2} v_i / 2 \tag{A.1} \]

for every $q \ge 2$ and some constants $M$ and $v_i$, $i = 1, \dots, n$. Then for $x > 0$

\[ \mathbb{P}\Big( \Big| \sum_{i=1}^n X_i \Big| \ge x \Big) \le 2 e^{-\frac{1}{2} \frac{x^2}{v + Mx}} \]

with $v = \sum_{i=1}^n v_i$.

Now let us prove Lemma 3.1. Observe that

\[ \langle Ax, Ay \rangle = \frac{1}{n} \sum_{\ell=1}^n \sum_{k=1}^d \sum_{j=1}^d g_{\ell k}\, g_{\ell j}\, x_k y_j \]

where $g_{\ell k}$, $\ell = 1, \dots, n$, $k = 1, \dots, d$ are independent standard Gaussians. We define the
random variable

\[ Y := \sum_{k,j=1}^d g_k g_j x_k y_j \]

where again the $g_k$, $k = 1, \dots, d$ are independent standard Gaussians. Then we can write

\[ \langle Ax, Ay \rangle = \frac{1}{n} \sum_{\ell=1}^n Y_\ell \]

where the $Y_\ell$ are independent copies of $Y$.

Let us investigate $Y$. The expectation of $Y$ is easily calculated as

\[ \mathbb{E}Y = \sum_{k=1}^d x_k y_k = \langle x, y \rangle. \]

Hence, also $\mathbb{E}[\langle Ax, Ay \rangle] = \langle x, y \rangle$. Now let

\[ Z := Y - \mathbb{E}Y = \sum_{k \ne j} g_j g_k x_j y_k + \sum_k (g_k^2 - 1) x_k y_k. \]

The random variable $Z$ is known as Gaussian chaos of order 2.

Thus, we have to show the moment bound (A.1) for the random variable $Z$. Note that
$\mathbb{E}Z = 0$. A general bound for Gaussian chaos (see [15, p. 65]) gives

\[ \mathbb{E}|Z|^q \le (q - 1)^q (\mathbb{E}|Z|^2)^{q/2} \tag{A.2} \]

for all $q \ge 2$. Using Stirling's formula, $q! = \sqrt{2\pi q}\, q^q e^{-q} e^{R_q}$, $\frac{1}{12q + 1} \le R_q \le \frac{1}{12q}$, we further
obtain

\[ \begin{aligned}
\mathbb{E}|Z|^q &\le q!\, \frac{(q - 1)^q}{e^{R_q} \sqrt{2\pi q}\, q^q e^{-q}} (\mathbb{E}|Z|^2)^{q/2}
 = \frac{e^2 \left(1 - \frac{1}{q}\right)^q}{e^{R_q} \sqrt{2\pi q}}\; q!\, (e^2\, \mathbb{E}|Z|^2)^{(q-2)/2}\, \mathbb{E}|Z|^2 \\
&\le q!\, \big(e (\mathbb{E}|Z|^2)^{1/2}\big)^{q-2} \frac{e}{\sqrt{6\pi}}\, \mathbb{E}|Z|^2 \qquad \text{for all } q \ge 3.
\end{aligned} \]
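Hence, the moment bound (A.1) holds for all $q \ge 3$ with

\[ M = e (\mathbb{E}|Z|^2)^{1/2}, \qquad v = \frac{2e}{\sqrt{6\pi}}\, \mathbb{E}|Z|^2, \]

and by direct inspection it then also holds for $q = 2$. So let us determine $\mathbb{E}|Z|^2$. Using
independence of the $g_k$ we obtain

\[ \begin{aligned}
\mathbb{E}|Z|^2 &= \mathbb{E}\Big[ \sum_{j \ne k} \sum_{j' \ne k'} g_j g_k g_{j'} g_{k'}\, x_j y_k x_{j'} y_{k'}
 + 2 \sum_{j \ne k} \sum_{k'} g_j g_k (g_{k'}^2 - 1)\, x_j y_k x_{k'} y_{k'} \\
&\qquad\quad + \sum_{k, k'} (g_k^2 - 1)(g_{k'}^2 - 1)\, x_k y_k x_{k'} y_{k'} \Big] \\
&= \sum_{k \ne j} \mathbb{E}[g_j^2]\, \mathbb{E}[g_k^2]\, x_j^2 y_k^2 + \sum_k \mathbb{E}[(g_k^2 - 1)^2]\, x_k^2 y_k^2 \\
&= \sum_{k \ne j} x_j^2 y_k^2 + 2 \sum_k x_k^2 y_k^2 \le 2
\end{aligned} \tag{A.3} \]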

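since by assumption $\|x\|_2, \|y\|_2 \le 1$. Denoting by $Z_\ell$, $\ell = 1, \dots, n$, independent copies of
$Z$, Theorem A.1 yields

\[ \mathbb{P}\big(|\langle Ax, Ay \rangle - \langle x, y \rangle| \ge t\big) = \mathbb{P}\Big( \Big| \sum_{\ell=1}^n Z_\ell \Big| \ge nt \Big) \le 2 e^{-\frac{1}{2} \frac{n^2 t^2}{nv + nMt}} = 2 e^{-n \frac{t^2}{C_1 + C_2 t}} \]

with $C_1 = \frac{2e}{\sqrt{6\pi}}\, \mathbb{E}|Z|^2 \le \frac{4e}{\sqrt{6\pi}} \approx 2.5044$ and $C_2 = 2e\sqrt{2} \approx 7.6885$.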



For the case of Bernoulli random matrices the proof is completely analogous. We just
have to replace the standard Gaussians $g_k$ by $\pm 1$ Bernoulli variables. In particular, the
estimate (A.2) for the chaos variable $Z$ is still valid, see [15, p. 105]. Furthermore, for Bernoulli
variables $g_k$ we clearly have $\mathbb{E}[g_k^2] = 1$ and $\mathbb{E}[(g_k^2 - 1)^2] = 0$. Hence, the corresponding
estimate in (A.3) yields $\mathbb{E}|Z|^2 \le 1$, and we end up with the constants $C_1 = \frac{2e}{\sqrt{6\pi}} \approx 1.2522$ and
$C_2 = 2e \approx 5.4366$.